K-Nearest Neighbor Search by Random Projection Forests

Abstract

K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the huge success of tree-based methodology and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees, each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves remarkable accuracy in terms of a fast-decaying missing rate of kNNs and a fast-decaying discrepancy in the k-th nearest neighbor distances. As a tree-based methodology, rpForests has very low computational complexity. Its ensemble nature makes it easy to parallelize on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing that the probability of neighboring points being separated decays exponentially as the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests, and we show that the effect is remarkable.
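
The abstract describes rpForests only at a high level, so the Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each tree splits a node by projecting its points onto a random Gaussian direction and cutting at the median, that a query's candidates are the union of its leaves across all trees (ranked afterwards by exact distance), and that the "missing rate" means the fraction of true k nearest neighbors the forest fails to return. The function names (`build_tree`, `build_forest`, `forest_knn`, `missing_rate`) and the `leaf_size` parameter are hypothetical; the paper's refined choice of projections is not reproduced here.

```python
import math
import random


def build_tree(points, idx, leaf_size, rng):
    """Grow one random projection tree over the point indices in `idx`.

    Assumed split rule: project the node's points onto a random Gaussian
    direction and split them at the median projection; recursion stops once
    a node holds at most `leaf_size` points.
    """
    if len(idx) <= leaf_size:
        return ("leaf", idx)
    direction = [rng.gauss(0.0, 1.0) for _ in points[0]]
    proj = {i: sum(d * x for d, x in zip(direction, points[i])) for i in idx}
    order = sorted(idx, key=lambda i: proj[i])
    mid = len(order) // 2
    # Cut halfway between the two middle projections; both halves are non-empty.
    cut = (proj[order[mid - 1]] + proj[order[mid]]) / 2.0
    return ("node", direction, cut,
            build_tree(points, order[:mid], leaf_size, rng),
            build_tree(points, order[mid:], leaf_size, rng))


def find_leaf(tree, q):
    """Route query q down the tree and return the point indices in its leaf."""
    while tree[0] == "node":
        _, direction, cut, left, right = tree
        side = sum(d * x for d, x in zip(direction, q))
        tree = left if side < cut else right
    return tree[1]


def build_forest(points, n_trees, leaf_size=10, seed=0):
    """Grow n_trees trees, each from its own sequence of random directions."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    rng = random.Random(seed)
    all_idx = list(range(len(points)))
    return [build_tree(points, all_idx, leaf_size, rng) for _ in range(n_trees)]


def forest_knn(points, forest, q, k):
    """Pool the query's leaf from every tree, then rank candidates exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(find_leaf(tree, q))
    return sorted(candidates, key=lambda i: math.dist(points[i], q))[:k]


def exact_knn(points, q, k):
    """Brute-force k nearest neighbors, used as ground truth."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], q))[:k]


def missing_rate(points, forest, queries, k):
    """Fraction of true k nearest neighbors that the forest fails to return."""
    missed = 0
    for q in queries:
        truth = set(exact_knn(points, q, k))
        found = set(forest_knn(points, forest, q, k))
        missed += len(truth - found)
    return missed / (k * len(queries))


if __name__ == "__main__":
    data_rng = random.Random(42)
    points = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
    queries = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
    for n_trees in (1, 5, 20):
        forest = build_forest(points, n_trees, leaf_size=20, seed=0)
        print(n_trees, "trees: missing rate", missing_rate(points, forest, queries, k=5))
```

Because `build_forest` reseeds its generator with the same `seed`, the 5-tree forest contains the tree of the 1-tree forest, so its candidate set can only grow and its missing rate cannot increase. Each tree depends only on the data and its own random directions, which is why the trees could be grown on separate cores or machines. As intuition for the exponential-decay claim: if one random tree separates a given pair of points with probability p < 1, and the trees use independent randomness, the chance that all T trees separate that pair is p^T, which shrinks exponentially in T. This is only a simplified illustration, not the paper's theorem.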

Similar resources

Fast Approximate Nearest-Neighbor Search with k-Nearest Neighbor Graph

We introduce a new nearest neighbor search algorithm. The algorithm builds a nearest neighbor graph in an offline phase and when queried with a new point, performs hill-climbing starting from a randomly sampled node of the graph. We provide theoretical guarantees for the accuracy and the computational complexity and empirically show the effectiveness of this algorithm.

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to many kinds of library resources. However, classifying documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classifying similar documents into specific classes of data can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...

High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests

We propose a new data structure, the generalized randomized k-d forest, or k-d GeRaF, for approximate nearest neighbor searching in high dimensions. In particular, we introduce new randomization techniques to specify a set of independently constructed trees where search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations, thus ac...

Evaluation Accuracy of Nearest Neighbor Sampling Method in Zagross Forests

Collection of appropriate qualitative and quantitative data is necessary for proper management and planning. Using suitable inventory methods is necessary, and the accuracy of sampling methods depends on the inventory grid and the number of sample points. The nearest neighbor sampling method is one of the distance methods and is calculated by three equations (Byth and Ripley, 1980; Cottam and Curtis, 1956 and Cota...

Journal

Journal title: IEEE Transactions on Big Data

Year: 2021

ISSN: 2372-2096, 2332-7790

DOI: https://doi.org/10.1109/tbdata.2019.2908178